AITopics | overparameterized neural network

Collaborating Authors

overparameterized neural network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ae614c557843b1df326cb29c57225459-Paper.pdf

Neural Information Processing SystemsFeb-13-2026, 14:36:03 GMT

In this work, we showthat this "lazy training" phenomenon isnot specific tooverparameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding amodel equivalenttolearning withpositive-definite kernels.

artificial intelligence, machine learning, neural network, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Add feedback

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang

Neural Information Processing SystemsFeb-12-2026, 09:22:12 GMT

The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn?

artificial intelligence, machine learning, neural network, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks

Guodong Zhang, James Martens, Roger B. Grosse

Neural Information Processing SystemsFeb-11-2026, 15:38:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, arxivpreprintarxiv, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

Add feedback

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Neural Information Processing SystemsDec-25-2025, 11:37:54 GMT

The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network, and connect it to the SGD theory of escaping saddle points.

learning and generalization, name change, overparameterized neural network, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Optimization and Bayes: A Trade-off for Overparameterized Neural Networks

Neural Information Processing SystemsDec-24-2025, 09:03:17 GMT

This paper proposes a novel algorithm, Transformative Bayesian Learning (TansBL), which bridges the gap between empirical risk minimization (ERM) and Bayesian learning for neural networks. We compare ERM, which uses gradient descent to optimize, and Bayesian learning with importance sampling for their generalization and computational complexity. We derive the first algorithm-dependent PAC-Bayesian generalization bound for infinitely wide networks based on an exact KL divergence between the trained posterior distribution obtained by infinitesimal step size gradient descent and a Gaussian prior. Moreover, we show how to transform gradient-based optimization into importance sampling by incorporating a weight. While Bayesian learning has better generalization, it suffers from low sampling efficiency. Optimization methods, on the other hand, have good sampling efficiency but poor generalization. Our proposed algorithm TansBL enables a trade-off between generalization and sampling efficiency.

name change, optimization and bayes, overparameterized neural network, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Add feedback

Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks

Neural Information Processing SystemsDec-23-2025, 22:53:09 GMT

We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets, and explore its dependence on the initialization, width, and depth of fully connected neural networks. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training. Notably, for the special setting of linearized network, our analysis indicates that the squared gradient norm (and therefore the escalation of privacy loss) is tied directly to the per-layer variance of the initialization distribution. By using this analysis, we demonstrate that privacy bound improves with increasing depth under certain initializations (LeCun and Xavier), while degrades with increasing depth under other initializations (He and NTK). Our work reveals a complex interplay between privacy and depth that depends on the chosen initialization distribution. We further prove excess empirical risk bounds under a fixed KL privacy budget, and show that the interplay between privacy utility trade-off and depth is similarly affected by the initialization.

initialization matter, overparameterized neural network, privacy-utility analysis, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

St-Arnaud, William, Carvalho, Margarida, Farnadi, Golnoosh

arXiv.org Machine LearningNov-11-2025

Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinitely large widths and small learning rate, the kernel that is obtained allows to represent the output of the learned model with a closed-form solution. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depths by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, they show the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the order of magnitude in network depth required to observe this convergent behavior, and we describe the essential properties that enable the generalization of our results to other kernels.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

2511.07272

Country: North America > Canada > Quebec (0.28)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

690ddbee6eef37933f4be0abeb7aff45-Paper-Conference.pdf

Neural Information Processing SystemsAug-15-2025, 13:28:26 GMT

classifier, neural network, square loss, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

main

Zhuoran Yang

Neural Information Processing SystemsAug-15-2025, 12:02:54 GMT

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.

algorithm, arxiv preprint arxiv, neural information processing system, (11 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

Reviews: On Lazy Training in Differentiable Programming

Neural Information Processing SystemsJan-26-2025, 11:14:55 GMT

The paper provided some interesting understanding, but is not significant enough to explain interesting issues in deep learning. The paper showed that lazy training can be caused by parameter scaling, not special to overparameterization of neural networks. What does this tell us about the overparameterized neural networks? Does this result imply that lazy regime of overparameterized neural networks is necessarily due to parameter scaling? If not, lazy regime of overparameterized neural networks cannot be explained simply by parameter scaling.

differentiable programming, overparameterized neural network, parameter scaling, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback